Abstract

Based on the use of different exponential bases to define class-dependent
error bounds, a new and highly efficient asymmetric boosting scheme, coined
as AdaBoostDB (Double-Base), is proposed. Supported by a fully theoretical
derivation procedure, unlike most of the other approaches in the literature,
our algorithm preserves all the formal guarantees and properties of
original (cost-insensitive) AdaBoost, similarly to the state-of-the-art
Cost-Sensitive AdaBoost algorithm. However, the key advantage of AdaBoostDB
is that our novel derivation scheme enables an extremely efficient conditional
search procedure, dramatically improving and simplifying the training phase
of the algorithm. Experiments, both over synthetic and real datasets,
reveal that AdaBoostDB is able to save over 99% training time with regard
to Cost-Sensitive AdaBoost, providing the same cost-sensitive results. This
computational advantage of AdaBoostDB can make a difference in problems
managing huge pools of weak classifiers in which boosting techniques are
commonly used.
1. Introduction

Boosting algorithms [1], with AdaBoost as epitome, have been an
active focus of research since their first publication in the 1990s. Their strong
theoretical guarantees, together with promising practical results including
robustness against overfitting and ease of implementation, have drawn
attention to this family of algorithms over the last decade [?],
both from the theoretical and practical perspectives.

A plethora of different applications (medical diagnosis, fraud detection,
biometrics, disaster prediction...) have implicit classification tasks with
well-defined costs depending on the different kinds of mistakes in each
possible decision (false positives and false negatives). On the other hand, many
problems have imbalanced class priors, so one class is far more frequent
or easier to sample than the other one. To deal with such scenarios [?],
classifiers must be capable of focusing their attention on the rare
class, instead of searching for hypotheses that, trying to fit the data well,
end up being driven by the prevalent class.

Several modifications of AdaBoost have been proposed in the literature
to deal with asymmetry [12], [13], [?], [8], [?], [?], [17]. In the well-known
Viola and Jones face detector framework, a validation set is used to modify
the AdaBoost strong classifier threshold a posteriori, in order to adjust the
balance between false positive and detection rates. Nevertheless, as the authors
stated, it is not clear whether the selected weak classifiers are optimal for the
asymmetric goal [15], nor whether these modifications preserve the original training
and generalization error guarantees of AdaBoost [?]. The vast majority of other
proposed methods [?], [16] try to cope with asymmetry through direct manipulations
of the weight updating rule. These proposals, not being a full reformulation
of the algorithm for asymmetric scenarios, have been analyzed [18], [17] to
be heuristic modifications of AdaBoost. However, two recent contributions
have been proposed to deal with the asymmetric boosting problem in a fully
theoretical way. On the one hand, the Cost-Sensitive Boosting framework by
H. Masnadi-Shirazi and N. Vasconcelos [17] leads to an algorithm far more
complex and computationally demanding than the original (symmetric)
AdaBoost, but with strong theoretical guarantees. On the other hand, the
class-conditional description of AdaBoost by I. Landesa-Vazquez and J.L.
Alba-Castro [19] demonstrates that asymmetric weight initialization is also
an effective and theoretically sound way to reach boosted cost-sensitive
classifiers. These two theoretical alternatives follow different "asymmetrizing"
perspectives and lead to different solutions.

In this work we will follow an approach closer to the Cost-Sensitive Boosting
framework [17]. Though sharing equivalent theoretical roots and guarantees
with Cost-Sensitive AdaBoost [17], our proposal entails a new and
self-contained analytical framework leading to a novel asymmetric boosting
algorithm which we call AdaBoostDB (from AdaBoost with Double-Base). Our
approach is based on three distinctive premises: its derivation is inspired by
the generalized boosting framework [20] (unlike the Statistical View of Boosting
followed by [17]), its error bound is modeled in terms of class-conditional
(double) exponential bases, and two parallel class-conditional weight
subdistributions are used and updated during the boosting iterations. As a result,
from a different (though theoretically equivalent) perspective, and following
a completely different derivation path, we reach an algorithm able to find the
same solution as Cost-Sensitive AdaBoost, but in a much more efficient way.
Indeed, our approach gives rise to a more tractable mathematical model and
enables a searching scheme that dramatically reduces the number of weak
classifiers to be evaluated in each iteration.

The paper is organized as follows: In the next section we describe the
original AdaBoost algorithm and the way asymmetric variations have been
proposed in the literature, paying special attention to the Cost-Sensitive
AdaBoost algorithm [17]. In Section 3, AdaBoostDB and the Conditional Search
method are derived, explained and discussed. In Section 4 the empirical
framework and experiments are shown. Finally, Section 5 includes the main
ideas, conclusions and future research lines drawn from our work.


2. AdaBoost and Cost-Sensitive AdaBoost
2.1. AdaBoost 

Given a space of feature vectors $X$ and two possible class labels
$y \in \{+1, -1\}$, the goal of AdaBoost is to learn a strong classifier $H(x)$ as a
weighted ensemble of weak classifiers $h_t(x)$ predicting the label of any instance $x \in X$.

$$H(x) = \mathrm{sign}(f(x)) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right) \tag{1}$$


From a training set of $n$ examples $x_i$, each of them labeled as positive
($y_i = 1$) or negative ($y_i = -1$), and a weight distribution $D_t(i)$ defined
over them for each learning round $t$, the weak learner must select the best
classifier $h_t(x)$ according to the labels and weights. Once a weak classifier
is selected, it is added to the ensemble modulated by a goodness parameter
$\alpha_t$ (2), and the weight distribution is updated accordingly (3). The weak
hypothesis search is guided to maximize the goodness $\alpha_t$, which is equivalent
to maximizing the weighted correlation between labels ($y_i$) and predictions ($h_t$).
This procedure can be repeated iteratively until a predefined number $T$ of training
rounds has been completed or some performance goal is reached:

$$\alpha_t = \frac{1}{2} \log\left(\frac{1 + \sum_{i=1}^{n} D_t(i) y_i h_t(x_i)}{1 - \sum_{i=1}^{n} D_t(i) y_i h_t(x_i)}\right) \tag{2}$$

$$D_{t+1}(i) = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{\sum_{j=1}^{n} D_t(j) \exp(-\alpha_t y_j h_t(x_j))} = \frac{D_t(i) \exp(-\alpha_t y_i h_t(x_i))}{Z_t} \tag{3}$$


This scheme can be derived [?] as a round-by-round (additive) minimization
of an exponential bound on the strong classifier training error, coming
from the following inequality:

$$H(x_i) \neq y_i \;\Rightarrow\; y_i f(x_i) \leq 0 \;\Rightarrow\; e^{-y_i f(x_i)} \geq 1 \tag{4}$$

From now on, as in many other studies, we will focus on the discrete
version of AdaBoost for a simpler and more intuitive analysis (which does
not prevent our derivations from being also applied to other variations of
the algorithm). In this case weak hypotheses are binary, $h_t(x) \in \{-1, +1\}$, so
parameter $\alpha_t$ can be rewritten (5) in terms of the weighted error $\varepsilon_t$ (6), and
the weak hypothesis search is equivalent to finding the classifier with the smallest $\varepsilon_t$.

$$\alpha_t = \frac{1}{2} \log\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right) \tag{5}$$

$$\varepsilon_t = \sum_{i=1}^{n} D_t(i) [\![ h_t(x_i) \neq y_i ]\!] = \sum_{\text{nok}} D_t(i) \tag{6}$$

As can be seen, we will follow the notation from [?], where the operator $[\![ a ]\!]$ is 1
when $a$ is true and 0 otherwise. In addition, for the sake of simplicity, we
will use the term 'ok' to refer to those training examples for which the result
of the weak classifier is right ($i : h(x_i) = y_i$) and 'nok' when it is wrong
($i : h(x_i) \neq y_i$).
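As an illustrative sketch only (it is not taken from the original work), one round of discrete AdaBoost for a fixed weak classifier can be written in plain Python following equations (3), (5) and (6). The function name `adaboost_round` and the list-based data layout are assumptions made for this example.

```python
import math

def adaboost_round(weights, labels, predictions):
    """One discrete AdaBoost round for a fixed weak classifier (hypothetical helper).

    weights:     current distribution D_t(i), assumed to sum to 1
    labels:      y_i in {-1, +1}
    predictions: h_t(x_i) in {-1, +1}
    Returns (epsilon_t, alpha_t, next_weights).
    """
    # Weighted error: total weight of the 'nok' examples, eq. (6).
    eps = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < eps < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    # Goodness parameter for binary weak hypotheses, eq. (5).
    alpha = 0.5 * math.log((1.0 - eps) / eps)
    # Unnormalized update followed by division by Z_t, eq. (3).
    raw = [w * math.exp(-alpha * y * h)
           for w, y, h in zip(weights, labels, predictions)]
    z = sum(raw)
    return eps, alpha, [r / z for r in raw]

# Toy example: four equally weighted examples, the second one misclassified.
eps, alpha, new_w = adaboost_round(
    [0.25, 0.25, 0.25, 0.25], [1, 1, -1, -1], [1, -1, -1, -1])
```

In this toy case the misclassified example ends up holding half of the total weight after the update, which is the usual behavior of the discrete AdaBoost reweighting.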
2.2. Cost-Sensitive AdaBoost


As initially defined, the exponential error bound does not have any
direct class-dependent behavior, so several modifications of AdaBoost have
been proposed in the literature to overcome this seemingly symmetric nature.

Most of the proposed variations [12], [13], [?] are based on directly
modifying the weight update rule in an asymmetric (class-conditional) way.
However, since the update rule is a consequence of the error bound
minimization process, the way these changes really affect the theoretical
properties and optimality of AdaBoost cannot be guaranteed.

Considering those previous variations as heuristic, Masnadi-Shirazi and
Vasconcelos [17] proposed a theoretically sound approach based on the
Statistical View of Boosting. According to this interpretation, boosting
algorithms can be seen as round-by-round estimations building an additive
logistic regression model, and the minimization of the exponential error bound
can be modeled as the minimization of the following expression, where $E$ denotes
expectation:

$$J(f) = E\left[e^{-y f(x)}\right] \tag{7}$$

Setting the derivative $\partial J(f)/\partial f(x)$ to zero, we can obtain the solution of
the minimization problem as the weighted logistic transform of $P(y = 1 \mid x)$:

$$f(x) = \frac{1}{2} \log \frac{P(y = 1 \mid x)}{P(y = -1 \mid x)} \tag{8}$$


Following this perspective, Masnadi-Shirazi and Vasconcelos adapted it
to the cost-sensitive case

$$J(f) = E\left[ [\![ y = 1 ]\!] e^{-C_P f(x)} + [\![ y = -1 ]\!] e^{C_N f(x)} \right] \tag{9}$$

$$f(x) = \frac{1}{C_P + C_N} \log \frac{C_P P(y = 1 \mid x)}{C_N P(y = -1 \mid x)} \tag{10}$$

where $C_P$ and $C_N$ denote the misclassification costs for positives and
negatives. The result of their derivation is the Cost-Sensitive AdaBoost
algorithm shown in Algorithm 1.

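It is important to note that, for the sake of homogeneity and simplicity, we
have kept and followed the original notation by Schapire and Singer throughout
the entire paper. Because of this, we have had to adapt the notation used in
[17] to this format. As in that work, we have also particularized our
analysis to the most common case of having an initial pool of weak classifiers.

Algorithm 1 Cost-Sensitive AdaBoost

Input:
    Training set of $n$ examples $(x_i, y_i)$, where $y_i = 1$ if $1 \leq i \leq m$ and $y_i = -1$ if $m < i \leq n$
    Pool of $F$ weak classifiers: $h_f(x)$
    Cost parameters: $C_P$, $C_N$
    Number of rounds: $T$

Initialize:
    Uniform distribution of weights for each class:
        $D(i) = \frac{1}{2m}$ if $1 \leq i \leq m$
        $D(i) = \frac{1}{2(n - m)}$ if $m < i \leq n$

for $t = 1$ to $T$ do
    Calculate parameters:
        $T_P = \sum_{i=1}^{m} D(i)$
        $T_N = \sum_{i=m+1}^{n} D(i)$
    for $f = 1$ to $F$ do
        Pick up weak classifier: $h_f(x)$
        Calculate parameters:
            $\mathcal{B} = \sum_{i=1}^{m} D(i) [\![ y_i \neq h_f(x_i) ]\!]$
            $\mathcal{D} = \sum_{i=m+1}^{n} D(i) [\![ y_i \neq h_f(x_i) ]\!]$
        Find $\alpha_{t,f}$ solving the following hyperbolic equation:
            $2 C_P \mathcal{B} \cosh(C_P \alpha_{t,f}) + 2 C_N \mathcal{D} \cosh(C_N \alpha_{t,f}) = C_P T_P e^{-C_P \alpha_{t,f}} + C_N T_N e^{-C_N \alpha_{t,f}}$
        Compute the loss of the weak learner:
            $L_{t,f} = \mathcal{B} \left(e^{C_P \alpha_{t,f}} - e^{-C_P \alpha_{t,f}}\right) + T_P e^{-C_P \alpha_{t,f}} + \mathcal{D} \left(e^{C_N \alpha_{t,f}} - e^{-C_N \alpha_{t,f}}\right) + T_N e^{-C_N \alpha_{t,f}}$
    end for
    Select the weak learner $(h_t(x), \alpha_t)$ of smallest loss in this round: $\arg\min_f L_{t,f}$
    Update weights:
        $D(i) \leftarrow D(i) e^{-C_P \alpha_t h_t(x_i)}$ if $1 \leq i \leq m$
        $D(i) \leftarrow D(i) e^{C_N \alpha_t h_t(x_i)}$ if $m < i \leq n$
end for

Final Classifier:
    $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$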
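To make the scalar search of Algorithm 1 more concrete, the following minimal sketch solves the hyperbolic equation by bisection and evaluates the corresponding loss. It assumes that `b` and `d` are the weights of the misclassified positives and negatives (the reading consistent with the hyperbolic equation), and the function names are hypothetical; the original work does not prescribe a particular root finder.

```python
import math

def cs_adaboost_alpha(c_p, c_n, t_p, t_n, b, d, tol=1e-12):
    """Solve the hyperbolic equation for alpha by bisection (illustrative sketch).

    Assumption: b and d are the misclassified weights of positives and negatives.
    """
    if b + d <= 0.0 or (t_p - b) + (t_n - d) <= 0.0:
        raise ValueError("need some misclassified and some correct weight")

    def g(alpha):
        # Left-hand side minus right-hand side; non-decreasing in alpha.
        return (2.0 * c_p * b * math.cosh(c_p * alpha)
                + 2.0 * c_n * d * math.cosh(c_n * alpha)
                - c_p * t_p * math.exp(-c_p * alpha)
                - c_n * t_n * math.exp(-c_n * alpha))

    lo, hi = -1.0, 1.0
    while g(lo) > 0.0:      # widen the bracket until g(lo) <= 0 <= g(hi)
        lo *= 2.0
    while g(hi) < 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def cs_adaboost_loss(alpha, c_p, c_n, t_p, t_n, b, d):
    """Loss of a weak learner for a given alpha, as in Algorithm 1."""
    return (b * (math.exp(c_p * alpha) - math.exp(-c_p * alpha))
            + t_p * math.exp(-c_p * alpha)
            + d * (math.exp(c_n * alpha) - math.exp(-c_n * alpha))
            + t_n * math.exp(-c_n * alpha))

# Symmetric sanity check: equal costs and a total weighted error of 0.2.
alpha_demo = cs_adaboost_alpha(1.0, 1.0, 0.5, 0.5, 0.1, 0.1)
loss_demo = cs_adaboost_loss(alpha_demo, 1.0, 1.0, 0.5, 0.5, 0.1, 0.1)
```

With equal costs the solution reduces to the classical AdaBoost value $\frac{1}{2}\log((1-\varepsilon)/\varepsilon)$, which offers a quick consistency check of the sketch.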

3. AdaBoostDB

Following the analytical guidelines proposed by Schapire and Singer [?],
in this section we will present and theoretically derive our asymmetric
generalization of AdaBoost, AdaBoostDB, based on modifying the usual
cost-insensitive exponential error bound with class-dependent bases.

3.1. Double-Base Error Bound

Based on the inequality in equation (4), the original AdaBoost formulation
is geared to minimize an exponential error bound $\tilde{E}_T$ on the weighted
training error $E_T$ (11). For minimization purposes, the specific exponential
base $\beta$ is irrelevant whenever $\beta > 1$ so, for simplicity, the selected base in
the classical formulation of AdaBoost is $\beta = e$.

$$E_T = \sum_{i=1}^{n} D_1(i) [\![ H(x_i) \neq y_i ]\!] \leq \sum_{i=1}^{n} D_1(i) \beta^{-y_i f(x_i)} = \tilde{E}_T \tag{11}$$

If we suppose, without loss of generality, that our training set is divided
into two meaningful subsets (the first $m$ examples, positives, and the rest,
negatives) we can define exponential bounds with different bases for each one.
Calling these bases $\beta_P$ and $\beta_N$, the decomposed exponential bound $\tilde{E}_T$ can
be expressed as equation (12). We will assume, without loss of generality,
that $\beta_P > \beta_N > 1$.

$$\tilde{E}_T = \sum_{i=1}^{m} D_1(i) \beta_P^{-y_i f(x_i)} + \sum_{i=m+1}^{n} D_1(i) \beta_N^{-y_i f(x_i)} \tag{12}$$

This base-dependent behavior can be graphically analyzed in Figure 1:
the greater one base is with respect to the other, the more penalized its
respective errors are. Therefore, associating $\beta_P$ with the positive examples
subclass and $\beta_N$ with the negative ones, this imbalanced behavior can be directly
mapped to a class-imbalanced cost-sensitive approach.

Rewriting the expression of $\tilde{E}_T$ (12) in terms of asymmetric exponents
(13), this double-base perspective can be immediately linked with the
Cost-Sensitive Boosting framework: both approaches are equivalently
parameterized by class-conditional costs ($C_P = \log(\beta_P)$ and $C_N = \log(\beta_N)$, for
positives and negatives respectively) and have the same statistical meaning [?].

$$\tilde{E}_T = \sum_{i=1}^{m} D_1(i) e^{-\log(\beta_P) y_i f(x_i)} + \sum_{i=m+1}^{n} D_1(i) e^{-\log(\beta_N) y_i f(x_i)} = \sum_{i=1}^{m} D_1(i) e^{-C_P y_i f(x_i)} + \sum_{i=m+1}^{n} D_1(i) e^{-C_N y_i f(x_i)} \tag{13}$$
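The equivalence between class-dependent bases and class-dependent costs can be checked numerically with a few lines of Python. This is only a sketch: the helper names and the toy scores below are assumptions chosen for illustration, not values from our experiments.

```python
import math

def double_base_bound(weights, labels, scores, beta_p, beta_n):
    """Exponential bound with base beta_p for positives and beta_n for negatives."""
    return sum(w * (beta_p if y == 1 else beta_n) ** (-y * f)
               for w, y, f in zip(weights, labels, scores))

def cost_exponent_bound(weights, labels, scores, c_p, c_n):
    """Same bound written with base e and class-dependent costs."""
    return sum(w * math.exp(-(c_p if y == 1 else c_n) * y * f)
               for w, y, f in zip(weights, labels, scores))

def weighted_error(weights, labels, scores):
    """Weighted error of H(x) = sign(f(x)); a zero score is counted as an error."""
    return sum(w for w, y, f in zip(weights, labels, scores) if y * f <= 0)

# Toy values chosen only for illustration.
weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
scores = [0.8, -0.3, -1.2, 0.1]
beta_p, beta_n = math.exp(2.0), math.exp(1.0)

bound_bases = double_base_bound(weights, labels, scores, beta_p, beta_n)
bound_costs = cost_exponent_bound(
    weights, labels, scores, math.log(beta_p), math.log(beta_n))
error = weighted_error(weights, labels, scores)
```

Both formulations return the same value, and that value upper-bounds the weighted training error, as expected for any pair of bases greater than one.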
Figure 1: Misclassification and AdaBoost exponential training error bounds with different
bases. The final score of the strong classifier is represented in the horizontal axis (negative
sign for errors and positive for correct classifications), while the vertical axis is the loss related
to each possible score.


3.2. Algorithm Derivation

As we have just seen, the double-base approach shares a common theoretical
root with Cost-Sensitive AdaBoost. However, our change in the point of
view, along with a derivation inspired by the original framework by Schapire
and Singer (instead of the Statistical View of Boosting used to derive
Cost-Sensitive Boosting), will allow us to follow a different derivation pathway,
ending in a much more efficient formulation.

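Let us suppose, again, that the first $m$ examples of the training set are
positives and the rest are negatives, so the base-dependent behavior results
in a class-dependent one. In this case, we can also split the initial weight
distribution $D_1$ into two class-dependent subdistributions, $D_{P,1}$ and $D_{N,1}$, for
positives and negatives respectively:

$$D_{P,1}(i) = \frac{D_1(i)}{\sum_{j=1}^{m} D_1(j)}, \quad \text{for } i = 1, \ldots, m \tag{14}$$

$$D_{N,1}(i) = \frac{D_1(i)}{\sum_{j=m+1}^{n} D_1(j)}, \quad \text{for } i = m + 1, \ldots, n \tag{15}$$

Defining the global weight of each class, $W_P$ and $W_N$, as follows,

$$W_P = \sum_{i=1}^{m} D_1(i) \tag{16}$$

$$W_N = \sum_{i=m+1}^{n} D_1(i) \tag{17}$$

the error bound $\tilde{E}_T$ can be decomposed into two class-dependent bounds
$\tilde{E}_{P,T}$ and $\tilde{E}_{N,T}$.

$$\tilde{E}_T = W_P \sum_{i=1}^{m} D_{P,1}(i) \beta_P^{-y_i f(x_i)} + W_N \sum_{i=m+1}^{n} D_{N,1}(i) \beta_N^{-y_i f(x_i)} = W_P \tilde{E}_{P,T} + W_N \tilde{E}_{N,T} \tag{18}$$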
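The decomposition in (18) can be illustrated with a small sketch that splits an initial distribution into the two class-conditional subdistributions and verifies that the weighted sum of the class bounds recovers the global bound. The helper names and the toy numbers are assumptions introduced only for this example.

```python
import math

def split_by_class(weights, m):
    """Split D_1 into (W_P, D_P1, W_N, D_N1); the first m examples are positives."""
    w_p = sum(weights[:m])
    w_n = sum(weights[m:])
    d_p = [w / w_p for w in weights[:m]]
    d_n = [w / w_n for w in weights[m:]]
    return w_p, d_p, w_n, d_n

def class_bound(sub_weights, labels, scores, beta):
    """Exponential bound of one class with its own base."""
    return sum(w * beta ** (-y * f)
               for w, y, f in zip(sub_weights, labels, scores))

# Toy values chosen only for illustration.
d1 = [0.1, 0.2, 0.3, 0.4]
m = 2
labels = [1, 1, -1, -1]
scores = [0.5, -0.2, -1.0, 0.3]
beta_p, beta_n = math.exp(2.0), math.e

w_p, d_p, w_n, d_n = split_by_class(d1, m)

# Global double-base bound computed directly from D_1.
global_bound = (class_bound(d1[:m], labels[:m], scores[:m], beta_p)
                + class_bound(d1[m:], labels[m:], scores[m:], beta_n))

# The same bound written as W_P * E_P + W_N * E_N.
decomposed = (w_p * class_bound(d_p, labels[:m], scores[:m], beta_p)
              + w_n * class_bound(d_n, labels[m:], scores[m:], beta_n))
```

Each subdistribution sums to one by construction, so the class weights $W_P$ and $W_N$ carry all the information about how much each class contributes to the global bound.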

Both error components are formally identical to the original bound (except
for the weight distributions), allowing us to directly insert different
exponential bases for each of them. This is just what we wanted. As in the
original AdaBoost formulation, initial weight subdistributions can be
extrapolated to round-by-round ones ($D_{P,t}$ and $D_{N,t}$), being iteratively updated and
normalized in an analogous way¹.

$$D_{P,t+1}(i) = \frac{D_{P,t}(i) \beta_P^{-\alpha_t y_i h_t(x_i)}}{\sum_{j=1}^{m} D_{P,t}(j) \beta_P^{-\alpha_t y_j h_t(x_j)}} = \frac{D_{P,t}(i) \beta_P^{-\alpha_t y_i h_t(x_i)}}{Z_{P,t}} \tag{19}$$

Two new parameters ($A_{P,t}$ and $A_{N,t}$) can also be defined as accumulators of the
training behavior over each class until round $t$ (20). Their definition can be
obtained by unraveling the weight update rule, and allows us to decouple
each class error bound into two factors (21): one depending only on the
previous rounds ($A_{P,t-1}$, $A_{N,t-1}$) and the other depending on the performance of the
current round ($Z_{P,t}$, $Z_{N,t}$, with a meaning homologous to that of $Z_t$ in the original
AdaBoost formulation).

$$A_{P,t} = \prod_{k=1}^{t} Z_{P,k} = \sum_{i=1}^{m} D_{P,1}(i) \beta_P^{-y_i \sum_{k=1}^{t} \alpha_k h_k(x_i)} \tag{20}$$

$$\tilde{E}_{P,t} = \sum_{i=1}^{m} D_{P,1}(i) \beta_P^{-y_i \sum_{k=1}^{t} \alpha_k h_k(x_i)} = A_{P,t-1} \sum_{i=1}^{m} D_{P,t}(i) \beta_P^{-\alpha_t y_i h_t(x_i)} = A_{P,t-1} Z_{P,t} \tag{21}$$

¹For brevity we will only show equations (19), (20) and (21) for the positive class
case. The negative ones are completely analogous to them.

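As a consequence, the total error bound to minimize ($\tilde{E}_t$) can be expressed as
(22).

$$\tilde{E}_t = W_P A_{P,t-1} Z_{P,t} + W_N A_{N,t-1} Z_{N,t} = W_P A_{P,t-1} \sum_{i=1}^{m} D_{P,t}(i) \beta_P^{-\alpha_t y_i h_t(x_i)} + W_N A_{N,t-1} \sum_{i=m+1}^{n} D_{N,t}(i) \beta_N^{-\alpha_t y_i h_t(x_i)} \tag{22}$$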

Due to the convexity of exponential functions, the minimum of this bound
$\tilde{E}_t$ can be analytically found by canceling its derivative. Defining the cost
parameters as commented in the previous section ($C_P = \log(\beta_P)$ and
$C_N = \log(\beta_N)$), and bearing in mind equation (23)², the goal derivative can be
expressed as (24).

$$\beta^{-\alpha_t y_i h_t(x_i)} = \frac{1 + y_i h_t(x_i)}{2} \beta^{-\alpha_t} + \frac{1 - y_i h_t(x_i)}{2} \beta^{\alpha_t} \tag{23}$$

$$\frac{\partial \tilde{E}_t}{\partial \alpha_t} = C_P W_P A_{P,t-1} \sum_{\text{Pos nok}} D_{P,t}(i) \beta_P^{\alpha_t} - C_P W_P A_{P,t-1} \sum_{\text{Pos ok}} D_{P,t}(i) \beta_P^{-\alpha_t} + C_N W_N A_{N,t-1} \sum_{\text{Neg nok}} D_{N,t}(i) \beta_N^{\alpha_t} - C_N W_N A_{N,t-1} \sum_{\text{Neg ok}} D_{N,t}(i) \beta_N^{-\alpha_t} = 0 \tag{24}$$

²Equation (23) is strictly true for the discrete case, when weak hypotheses are 1 or -1.
However, if weak hypotheses were real in the range [-1, 1], this equation would transform
into an upper bound as explained in [?]. In that case we would be minimizing an upper
bound on $\tilde{E}_t$ instead of $\tilde{E}_t$ directly, which is the same behavior as in the original AdaBoost
with real-valued weak predictors.


Although $C_P$ and $C_N$ do not have to be integer values in general, the real
asymmetry only relies on their relative magnitudes (how much a positive
costs relative to a negative), so we can always find equivalent integer values to
play this role whatever the desired asymmetry is.

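At this point, with $\alpha_t$ as the unknown variable, the minimization equation
can be modeled as a polynomial (30) by making a change of variable (25) and
rewriting it in terms of parameters (26), (27), (28) and (29), instead of the hyperbolic
model used in [17].

$$x = e^{\alpha_t} \tag{25}$$

$$a = \frac{C_P W_P A_{P,t-1}}{C_P W_P A_{P,t-1} + C_N W_N A_{N,t-1}} \tag{26}$$

$$b = \frac{C_N W_N A_{N,t-1}}{C_P W_P A_{P,t-1} + C_N W_N A_{N,t-1}} \tag{27}$$

$$\varepsilon_{P,t} = \sum_{\text{Pos nok}} D_{P,t}(i) \tag{28}$$

$$\varepsilon_{N,t} = \sum_{\text{Neg nok}} D_{N,t}(i) \tag{29}$$

$$a \cdot \varepsilon_{P,t} \cdot x^{2C_P} + b \cdot \varepsilon_{N,t} \cdot x^{C_P + C_N} - b (1 - \varepsilon_{N,t}) x^{C_P - C_N} - a (1 - \varepsilon_{P,t}) = 0 \tag{30}$$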

The latter equation, where $x$ is the independent variable, has in general
$2C_P$ possible solutions, from which, by the nature of the problem, we are only
interested in those that are real and positive. It is easy to see that $a$, $b$, $\varepsilon_{P,t}$ and $\varepsilon_{N,t}$
are, by definition, all real values in the $[0, 1]$ interval. As a consequence, there
is only one sign change between consecutive coefficients of the polynomial,
and by Descartes' Rule of Signs we can ensure that the equation has exactly
one real and positive root, which is our solution.

The straightforward way to solve the posed problem is to calculate the
zeros of the polynomial and keep the only real and positive root. This
process should be repeated for all the possible weak hypotheses in order to
finally select the one leading to the greatest goodness $\alpha_t = \log(x_{\text{root}})$, that is, the
one with the greatest root. This direct mechanism, requiring a scalar search,
is very similar to that proposed in Cost-Sensitive AdaBoost but with the
computational advantage of evaluating a polynomial instead of a hyperbolic
function.
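A minimal sketch of this straightforward scalar search is shown below. It assumes $C_P \geq C_N$ and uses a simple bisection, exploiting the fact that the polynomial has a single positive root; bisection is an illustrative choice, not necessarily the root finder used in our experiments, and the function names are hypothetical.

```python
import math

def poly30(x, a, b, eps_p, eps_n, c_p, c_n):
    """Left-hand side of the polynomial equation (30); assumes c_p >= c_n."""
    return (a * eps_p * x ** (2 * c_p)
            + b * eps_n * x ** (c_p + c_n)
            - b * (1.0 - eps_n) * x ** (c_p - c_n)
            - a * (1.0 - eps_p))

def positive_root(a, b, eps_p, eps_n, c_p, c_n, tol=1e-12):
    """Unique positive root of (30), found by bisection (illustrative choice)."""
    if eps_p <= 0.0 and eps_n <= 0.0:
        raise ValueError("a weak classifier without errors has no finite root")
    lo, hi = 0.0, 1.0
    # The polynomial is non-positive at 0 and grows without bound, so doubling
    # hi eventually brackets the single sign change.
    while poly30(hi, a, b, eps_p, eps_n, c_p, c_n) < 0.0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if poly30(mid, a, b, eps_p, eps_n, c_p, c_n) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Symmetric check: equal costs and errors of 0.2 give x = 2, i.e. alpha = log 2.
root_sym = positive_root(0.5, 0.5, 0.2, 0.2, 1, 1)
# Asymmetric toy case with C_P = 2 and C_N = 1.
root_asym = positive_root(0.6, 0.4, 0.1, 0.3, 2, 1)
alpha_asym = math.log(root_asym)
```

In the symmetric case the goodness $\log(x_{\text{root}})$ coincides with the classical AdaBoost value $\frac{1}{2}\log((1-\varepsilon)/\varepsilon)$, as expected.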

3.3. Conditional Search

The main drawback of the straightforward solution in Section 3.2 is that
it still requires searching for the associated root for every classifier in
every boosting round. This could be computationally very expensive,
for example, in applications that need to select from hundreds of thousands
of different classifiers evaluated over several thousands of training examples,
such as computer vision algorithms [?]. Nevertheless, a slight change in
the point of view can serve to drastically reduce this computational burden.
If we define functions $S(x)$ and $V(x)$ as follows, we can rewrite equation (30)
as $S(x) = V(x)$.

$$S(x) = a + b \cdot x^{C_P - C_N} \tag{31}$$

$$V(x) = a \cdot \varepsilon_{P,t} \left(x^{2C_P} + 1\right) + b \cdot \varepsilon_{N,t} \left(x^{C_P + C_N} + x^{C_P - C_N}\right) \tag{32}$$

The first function $S(x)$ is a polynomial whose coefficients are the parameters
$a$ and $b$, which only depend on the previous boosting rounds. The second one,
$V(x)$, has coefficients that also depend on $\varepsilon_{P,t}$ and $\varepsilon_{N,t}$, so it depends
on the current round as well. As a result, the minimization procedure of a
given round can be modeled as the crossing point between a static function
$S(x)$, fixed for the current round, and a variable function $V(x)$.

It is also important to bear in mind some specificities (the problem is
graphically shown in Figure 2):

• By definition all parameters $a$, $b$, $\varepsilon_{P,t}$ and $\varepsilon_{N,t}$ are positive, so both
functions are increasing for $x > 0$.

• The crossing points with the y-axis are $(0, a \cdot \varepsilon_{P,t})$ for $V(x)$, and $(0, a)$
for $S(x)$. Taking into account that $\varepsilon_{P,t} < 1$ we have $V(0) < S(0)$.

• When $x \to \infty$, $V(x) > S(x)$.

• There is only one positive crossing point.



Figure 2: Crossing point scenario for the static $S(x)$ and variable $V(x)$ functions
modeling the minimization problem (a). Graphical representation of the Contribution (not
fulfilled) (b) and Improvement (c) conditions.


Descartes' rule of signs ensures the existence of one crossing point, but
only solutions satisfying $x > 1$ are interesting for the classification problem:
only weak hypotheses with some goodness, i.e. $\alpha_t > 0$, really contribute
to the strong classifier. This Contribution Condition can be formalized as
follows (33), and any weak classifier that does not meet this requirement
should be directly discarded for the current round without further computation.

$$V(1) < S(1) = 1 \;\Rightarrow\; a \cdot \varepsilon_{P,t} + b \cdot \varepsilon_{N,t} < \frac{1}{2} \tag{33}$$

On the other hand, once we have computed a valid solution, to comparatively
evaluate any other candidate we just need to know whether its related root
(i.e. its goodness $\alpha_t$) is greater than the one we already have. Using this
information, we would only have to effectively calculate the specific root (i.e. run
the scalar search) for those weak classifiers with greater roots, directly
rejecting the other ones. Bearing in mind that both $V(x)$ and $S(x)$ are increasing
functions, given two possible weak classifiers with their associated functions
$V_1(x)$ and $V_2(x)$ and the solution $x_1$ for the first of them, the second classifier
will only be better than the first one if $V_1(x_1) > V_2(x_1)$. We will call this
rule the Improvement Condition.
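The two conditions can be combined in a short sketch of the search loop. Since $S(1) = 1$ when $a + b = 1$, the Contribution Condition is treated here as the Improvement Condition evaluated at the starting point $x = 1$; this unification, the function names and the use of bisection for the scalar search are assumptions of this example rather than a transcription of our implementation. Candidates are given as pairs $(\varepsilon_{P,t}, \varepsilon_{N,t})$, with $a, b > 0$ and $C_P \geq C_N$ assumed.

```python
import math

def s_value(x, a, b, c_p, c_n):
    """Static function S(x), fixed for the current round."""
    return a + b * x ** (c_p - c_n)

def v_value(x, a, b, eps_p, eps_n, c_p, c_n):
    """Variable function V(x), one per candidate weak classifier."""
    return (a * eps_p * (x ** (2 * c_p) + 1.0)
            + b * eps_n * (x ** (c_p + c_n) + x ** (c_p - c_n)))

def crossing_point(lo, a, b, eps_p, eps_n, c_p, c_n, tol=1e-12):
    """Bisection for S(x) = V(x), starting from a point lo where V(lo) < S(lo)."""
    hi = 2.0 * lo
    while v_value(hi, a, b, eps_p, eps_n, c_p, c_n) < s_value(hi, a, b, c_p, c_n):
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if v_value(mid, a, b, eps_p, eps_n, c_p, c_n) < s_value(mid, a, b, c_p, c_n):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def conditional_search(candidates, a, b, c_p, c_n):
    """Return (best index, alpha, number of scalar searches actually run)."""
    best_index, best_x, roots_computed = None, 1.0, 0
    for index, (eps_p, eps_n) in enumerate(candidates):
        if eps_p <= 0.0 and eps_n <= 0.0:
            raise ValueError("a weak classifier without errors has no finite root")
        # Contribution Condition when best_x == 1, Improvement Condition otherwise.
        if (v_value(best_x, a, b, eps_p, eps_n, c_p, c_n)
                >= s_value(best_x, a, b, c_p, c_n)):
            continue  # rejected without running the scalar search
        best_x = crossing_point(best_x, a, b, eps_p, eps_n, c_p, c_n)
        best_index = index
        roots_computed += 1
    alpha = math.log(best_x) if best_index is not None else None
    return best_index, alpha, roots_computed

# Toy pool of four candidates with equal costs (illustrative values only).
best, alpha, n_roots = conditional_search(
    [(0.4, 0.4), (0.6, 0.6), (0.2, 0.2), (0.3, 0.3)], 0.5, 0.5, 1, 1)
```

In this toy pool only two of the four candidates trigger a scalar search: the second is rejected by the comparison at the current best root and the fourth cannot improve on the third.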
Applying both conditions to the weak hypothesis searching process in a
nested way, the average number of zeros effectively computed decreases by over
99.5% with respect to the straightforward solution, while consuming only
0.41% of its time (more details in Section 4.4.2). It is important to emphasize
that this improved searching technique, which we have coined as Conditional
Search, and the huge computational saving it brings, is made possible by the
polynomial and double-base modeling of the proposed framework.

A compact summary (for a direct implementation) of the final version of
the AdaBoostDB algorithm, including the Conditional Search, is given in
Algorithm 2.


4. Experiments 

To show and assess the performance of AdaBoostDB in practical terms we
have conducted a series of empirical experiments to analyze the asymmetric
behavior of the algorithm, comparing it with theoretical optimal classifiers
and with Cost-Sensitive AdaBoost using synthetic and real datasets.


4.1. Experimental Framework

Cost-sensitive classification problems can be totally parameterized in terms
of a cost matrix [?], whose components are the costs related to each possible
decision. For a two-class problem this matrix can be expressed as follows:

$$\begin{array}{l|cc} & \text{Negative} & \text{Positive} \\ \hline \text{Classified as Negative} & c_{nn} & c_{np} \\ \text{Classified as Positive} & c_{pn} & c_{pp} \end{array} \tag{34}$$

In detection problems costs related to good decisions are considered null
($c_{nn} = c_{pp} = 0$), so the cost matrix only depends on the two
error-related parameters, $c_{np}$ and $c_{pn}$, which are directly assimilable to $C_P$ and
$C_N$ in our previous theoretical analysis. Bearing in mind that the optimal
decision is unchanged when the cost matrix is multiplied by a constant, the
resulting matrix actually has only one degree of freedom, which we will call
the asymmetry ($\gamma$) of the problem.

$$\gamma = \frac{c_{np}}{c_{np} + c_{pn}} = \frac{C_P}{C_P + C_N} \tag{35}$$


The traditional way to evaluate and compare the behavior of different
classifiers across different working points has been based on the analysis of
ROC curves [21], [22]. Nevertheless, an alternative representation proposed by
C. Drummond and R.C. Holte [23], dual with respect to traditional ROC curves
and based on expected costs, has been shown to be more appropriate for
cost-sensitive classification problems (cost is explicitly presented, enabling
direct visual interpretations and comparisons). Our experimental analysis is
based on these representations.

Following the guidelines in [23], the Probability Cost Function (PCF) and the
Normalized Expected Cost (NEC) are defined in equations (36) and (37), where
$p(+)$ and $p(-)$ are the prior probabilities of an example being positive or
negative, while FN and FP are, respectively, the false negative and false
positive rates obtained by the classifier.

$$\mathrm{PCF} = \frac{p(+) C_P}{p(+) C_P + p(-) C_N} \tag{36}$$

$$\mathrm{NEC} = \mathrm{FN} \cdot \mathrm{PCF} + \mathrm{FP} \cdot (1 - \mathrm{PCF}) \tag{37}$$
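As a small worked example, the asymmetry, the PCF and the NEC can be computed as follows. The sketch assumes the standard cost-curve form NEC = FN·PCF + FP·(1 - PCF) by Drummond and Holte; the function names and the numeric values are illustrative assumptions, not results from our experiments.

```python
def asymmetry(c_p, c_n):
    """Asymmetry gamma of the problem, eq. (35)."""
    return c_p / (c_p + c_n)

def probability_cost_function(prior_pos, c_p, c_n):
    """PCF, eq. (36); the negative prior is taken as 1 - prior_pos."""
    prior_neg = 1.0 - prior_pos
    return prior_pos * c_p / (prior_pos * c_p + prior_neg * c_n)

def normalized_expected_cost(fn_rate, fp_rate, pcf):
    """NEC in the standard cost-curve form (assumed here), eq. (37)."""
    return fn_rate * pcf + fp_rate * (1.0 - pcf)

# Illustrative values: costs (C_P, C_N) = (3, 1), balanced priors,
# and a classifier with FN = 0.2 and FP = 0.1.
gamma = asymmetry(3.0, 1.0)
pcf = probability_cost_function(0.5, 3.0, 1.0)
nec = normalized_expected_cost(0.2, 0.1, pcf)
```

With balanced priors the PCF coincides with the asymmetry $\gamma$, so both take the value 0.75 in this example.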


4.2. Bayes Error Rates

As a first step, we are going to compare AdaBoostDB classifiers with their
optimal Bayes classifier counterparts for different cost combinations. To
this end, we have defined a synthetic dataset scenario from which we can
easily calculate the theoretically optimal classifier following the Bayes Risk
Rule. This synthetic scenario is illustrated in Figure 3: two bivariate normal
point clouds, one for positives and one for negatives, with the same priors and
variances but different means. As customary in many boosting works [4], [8], [17],
weak learners are stumps (the quintessential weak classifier) computed, in this
case, over the projection of the points on a discrete range of angles in the 2D
space (Figure 3b).

Two different random datasets were generated, one for training and the
other one for test. Nineteen different asymmetries to evaluate have also been
defined, trying to sweep a wide range of cost combinations:

$$(C_P, C_N) \in \{(1,100), (1,50), (1,25), (1,10), (1,7), (1,5), (1,3), (1,2), (2,3), (1,1), (3,2), (2,1), (3,1), (5,1), (7,1), (10,1), (25,1), (50,1), (100,1)\} \tag{38}$$

Therefore, 19 different AdaBoostDB classifiers were trained to be compared
with their respective optimal Bayes classifier counterparts, over the
same test set. In addition, another 19 classifiers using Cost-Sensitive
AdaBoost have also been trained as a preliminary comparison between this
algorithm and AdaBoostDB (we will delve into this issue in Section 4.4).

Figure 3: Bayes Risk datasets used in our experiments: Positive examples are marked as
'+', while negatives are 'o'. In figure (b) examples of weak classifiers are shown.


The goal of our first comparative test (also based on [23]) is to compute
the lower envelope of each set of classifiers (Bayes, AdaBoostDB and
Cost-Sensitive AdaBoost) in the cost space. This cost space is defined by
the relationship between the Probability Cost Function (x-axis) and the Normalized
Expected Cost (y-axis). In this framework, every classifier, though trained
for a specific asymmetry, can be tested in arbitrary cost scenarios (different
asymmetries for the same test set), thus drawing a line passing through $(0, \mathrm{FP})$
and $(1, \mathrm{FN})$ in the cost space. As a result, each family of classifiers will be
represented by a collection of lines whose lower envelope defines the
minimum cost classifier along the operating range (see Figure 4). Comparing the
three resulting lower envelopes (Figure 4d) we can appreciate an equivalent
behavior with only slight differences among them.
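The lower envelope described above can be sketched in a few lines: each classifier contributes the line going from $(0, \mathrm{FP})$ to $(1, \mathrm{FN})$, and the envelope is the pointwise minimum over the family. Sampling the PCF axis on a regular grid, the function name and the rates below are assumptions made for illustration.

```python
def lower_envelope(rates, grid_size=11):
    """Pointwise minimum of the cost lines of a family of classifiers.

    rates: list of (FN, FP) pairs, one per classifier.
    Returns a list of (PCF, minimum NEC) pairs on a regular grid of PCF values.
    """
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    envelope = []
    for pcf in grid:
        # Line through (0, FP) and (1, FN): NEC = FP + (FN - FP) * PCF.
        best = min(fp + (fn - fp) * pcf for fn, fp in rates)
        envelope.append((pcf, best))
    return envelope

# Two hypothetical classifiers given as (FN, FP) pairs.
envelope = lower_envelope([(0.1, 0.4), (0.3, 0.2)], grid_size=3)
```

For these two hypothetical classifiers the second one is cheaper for low PCF values and the first one for high values, with both lines crossing at PCF = 0.5.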

The second comparative test is among the same classifiers when tested
for the specific asymmetry they were trained for. Results can be seen in
Figure 5. AdaBoostDB performance follows the trend set by the Bayes optimal
classifier, describing a consistent and gradual asymmetric behavior across
the different costs and all the studied parameters (false positives, false
negatives, classification error and normalized expected cost). Moreover, as we
will comment in Section 4.4, the behavior of AdaBoostDB and Cost-Sensitive
AdaBoost is virtually the same.

Figure 4: Lower envelope graphic representations for the three classifier families: (a)
Bayes, (b) AdaBoostDB and (c) Cost-Sensitive AdaBoost. Figure (d) shows the three
lower envelopes superimposed.


4.3. Asymmetric behavior

Now the goal is to test the asymmetric behavior of AdaBoostDB over
heterogeneous classification problems, using synthetic and real datasets and
different cost requirements.

Synthetic datasets: In addition to the dataset used in the last section
(called the "Bayes" dataset), we will also use a two-cloud scenario inspired by
[?], in which positives and negatives are uniformly distributed in overlapping
circular/annular regions with different centroids (see Figure 6). Features are
again the projections of the examples over a discrete range of angles in the
2D space.

Figure 5: Performance comparison of classifiers obtained by AdaBoostDB, Bayes and
Cost-Sensitive AdaBoost for each specific asymmetry over the Bayes synthetic test set.
(a) False Positives, (b) False Negatives, (c) Classification Error, (d) Normalized Expected
Cost.

Real datasets: We selected several datasets, asymmetric by their own
definition, from the UCI Machine Learning Repository [?] (Credit, Ionosphere,
Diabetes and Spam). We have considered as positives the more valuable
classes according to the original problems.

Figure 6: Two Clouds dataset used in our experiments. Positive examples are marked as
'+', while 'o' are the negative ones (note that positive and negative classes are overlapped
in both cases).

In both synthetic and real cases, weak learners are stumps. For every
dataset and every cost requirement, we have followed a 3-fold cross-validation
strategy to evaluate the asymmetric performance: the whole dataset is
divided into three subsets, iteratively leaving one of them as the test set and the other
two forming the training set. As a result, for every dataset-cost combination,
we can obtain the performance averages of the three classifiers.
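The 3-fold strategy can be sketched with a small index-splitting helper. How examples are assigned to the three subsets is not specified in the text, so the interleaved assignment used below (and the function name) is only an assumption for illustration.

```python
def k_fold_splits(n_examples, n_folds=3):
    """Return a list of (train_indices, test_indices) pairs.

    Assumption: examples are assigned to folds in an interleaved fashion.
    """
    folds = [list(range(k, n_examples, n_folds)) for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        # Every fold except the k-th one forms the training set.
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train, folds[k]))
    return splits

# Seven examples split into three train/test partitions.
splits = k_fold_splits(7)
```

Each example appears in exactly one test subset, so averaging the three resulting classifiers uses every example once for testing.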

The results obtained are shown in Table 1. As expected, when positives
become more costly than negatives, false negative rates (FN, error in positives)
tend to decrease while false positive rates (FP, error in negatives) tend
to increase. In the opposite situation (when negatives become more costly
than positives) roles are accordingly exchanged, showing a progressive and
consistent asymmetric behavior, generalized across all the datasets and cost
combinations. Information in the table is supplemented with two global
performance measures, Classification Error (CE) and Normalized Expected Cost
(NEC), of each experiment.

4.4. AdaBoostDB vs. Cost-Sensitive AdaBoost

As explained in Section 3, AdaBoostDB and Cost-Sensitive AdaBoost
share a common theoretical root, but differ in the way they model and
derive that equivalent starting point. As a result, both frameworks give rise
to different algorithms that must obtain the same solution for a given
problem. This scenario has two consequences: on the one side, though classifiers
obtained by both algorithms should be theoretically identical when trained
in the same conditions, in practice numerical errors can make them differ.
On the other side, the polynomial model and Conditional Search mechanism
Table 1: AdaBoostDB asymmetric behavior (false negatives, false positives, classification
error and normalized expected cost) for each cost combination over the synthetic and UCI
datasets.


Cost 

Bayes 

TwoClouds 1 

FN 

FP 

CE 

NEC 

FN 

FP 

CE 

NEC 

[1,100] 

2.13'10-^ 

3.2110-11 

1.22-10-1 

3.3940-1 

9.12-10-1 

6.02-10"^ 

4.59-10-' 

1.5040-'' 

[1,50] 

1.7310-1 

4.0210-11 

1.0640-1 

4.2840-1 

9.12-10-1 

6.02-10"^ 

4.59-10-1 

2.3840-^ 

[1,25] 

1.7310-1 

3.2110-^ 

1.02-10-1 

3.7540-1 

9.12-10-1 

6.02-10-2 

4.59-10-1 

4.0940-^ 

[1,10] 

1.3710-1 

4.4210-^ 

9.04-10-11 

5.2640-1 

8.4940-1 

6.02-10-2 

4.28-10-1 

8.2740-^ 

[1,7] 

1.3710-1 

4.0210-11 

8.84-10-11 

5.22-10-1 

7.85 40-1 

2.21-10-2 

4.04-10-1 

14740-' 

[1,5] 

1.2910-1 

4.0210-11 

8.4340-11 

5.4940-1 

7.35 40-1 

2.21-10-2 

3.79-10-1 

1.4110-' 

[1,3] 

1.2040-1 

3.6110-^ 

7.8340-11 

5.72.10-1 

7.4340-1 

3.82-10-2 

3.91-10-1 

2.1440-' 

[1,2] 

1.29 10-1 

3.6110-11 

8.2340-11 

6.6940-1 

5.9640-1 

9.04-10-2 

3.43-10-1 

2.5940-' 

[2,3] 

1.0410-1 

4.8210-^ 

7.6340-11 

7.0740-1 

4.7840-1 

1.81-10-1 

3.29-10-1 

3.0010-' 

[1,1] 

5.62.10-11 

7.6310-11 

6.6340-11 

6.6340-1 

3.9210-1 

2.93-10-1 

3.42-10-1 

3.4240-' 

[3,2] 

6.02-10-11 

7.6340-^ 

6.8340-11 

6.6740-1 

2.2340-1 

4.08-10-1 

3.15-10-1 

2.9740-' 

[2,1] 

3.61 lO-n 

8.4310-11 

6.02-10-11 

5.22-10-1 

1.24-10-1 

5.48-10-1 

3.36-10-1 

2.6640-' 

[3,1] 

4.42-10-11 

8.84.10-^ 

6.63 lO-n 

5.52.10-1 

4.82-10-1 

6.53-10-1 

3.50-10-1 

1.9910-' 

[5,1] 

5.62.10-11 

7.6310-11 

6.63 lO-n 

5.9640-1 

1.0040-1 

8.13-10-1 

4.12-10-1 

1.4440-' 

[7,1] 

4.82-10-11 

1.0440-1 

7.6340-11 

5.52.10-1 

1.2040-1 

8.61-10-1 

4.37-10-1 

1.1840-' 

[10,1] 

4.42-10-^ 

1.2410-1 

8.4340-11 

5.1540-1 

1.0040-1 

8.73-10-1 

4.42-10-1 

8.8540-" 

[25,1] 

3.2110-^ 

2.0510-1 

1.1840-1 

3.8840-1 

1.0040-1 

9.48-10-1 

4.79-10-1 

4.6140-" 

[50,1] 

2.8140-^ 

1.8110-1 

1.04-10-1 

3.1110-1 

1.0040-1 

9.48-10-1 

4.79-10-1 

2.8440-" 

[100,1] 

2.8140-11 

1.8510-1 

1.0640-1 

2.9740-1 

1.0040-1 

9.48-10-1 

4.79-10-1 

1.9340-" 


Cost 

Credit 

Ionosphere | 

FN 

FP 

CE 

NEC 

FN 

FP 

CE 

NEC 

[1,100] 

9.9740-' 

1.4310-' 

3.00-10-' 

1.1340-" 

8.8440“' 

2.38-10-^ 

5.75-10-' 

3.2310“" 

[1,50] 

9.9740-' 

1.4310-' 

3.00-10-^ 

2.0940“" 

8.9340“' 

1.59-10-^ 

5.78-10-' 

3.3110“" 

[1,25] 

9.9340-' 

1.4310-' 

2.99-10-' 

3.9640“" 

5.5140“' 

8.73-10-2 

3.85-10-' 

1.0510“' 

[1,10] 

9.4040-' 

5.7210-' 

2.86-10-' 

9.0740“" 

4.4440“' 

8.73-10-2 

3.16-10-' 

1.2010“' 

[1,7] 

8.9740-' 

1.7210-" 

2.81-10-' 

1.2740“' 

3.42 40“' 

1.27-10-' 

2.65-10-' 

1.54.10“' 

[1,5] 

8.4340-' 

3.0040-" 

2.74-10-' 

1.6610“' 

2.6740“' 

1.51-10-' 

2.25-10-' 

1.7010“' 

[1,3] 

6.6740-' 

8.7340-" 

2.61-10-' 

2.3240“' 

2.3640“' 

2.46-10-' 

2.39-10-' 

2.4340“' 

[1,2] 

5.0310-' 

1.2310-' 

2.37-10-' 

2.5040“' 

1.2910“' 

2.46-10-' 

1.71-10-' 

2.0740“' 

[2,3] 

4.1740-' 

2.0210-' 

2.66-10-' 

2.8840“' 

1.6910“' 

2.38-10-' 

1.94-10-' 

2.1040“' 

[1,1] 

2.6040-' 

2.9010-' 

2.81-10-' 

2.7540“' 

3.5640“" 

3.02-10-' 

1.31-10-' 

1.6910“' 

[3,2] 

1.7740-' 

4.0310-' 

3.35-10-' 

2.6740“' 

6.6740“" 

3.65-10-' 

1.74-10-' 

1.8640“' 

[2,1] 

1.4740-' 

4.3810-' 

3.50-10-' 

2.4440“' 

5.3340“" 

3.17-10-' 

1.48-10-' 

1.41.10“' 

[3,1] 

1.2010-' 

5.2940-' 

4.06-10-' 

2.2240“' 

5.3340“" 

3.49-10-' 

1.60-10-' 

1.2740“' 

[5,1] 

7.3340-" 

6.7410-' 

4.93-10-' 

1.7310“' 

3.5640“" 

3.10-10-' 

1.34-10-' 

8.12.10“" 

[7,1] 

4.6740-" 

7.3210-' 

5.27-10-' 

1.3210“' 

2.6740“" 

3.57-10-' 

1.45-10-' 

6.8010“" 

[10,1] 

2.3340-" 

8.1840-' 

5.80-10-' 

9.5640“" 

3.5640“" 

3.73-10-' 

1.57-10-' 

6.62.10“" 

[25.1] 

3.3340-' 

9.2810-' 

6.51-10-' 

3.8940“" 

4.0040“" 

3.97-10-' 

1.68-10-' 

5.3740“" 

[50.1] 

0 

9.6740-' 

6.77-10-' 

1.9010“" 

4.4440“" 

4.05-10-' 

1.74-10-' 

5.1540“" 

[100,1] 

0 

9.8740-' 

6.91-10-' 

9.7740“' 

4.4440“" 

3.89-10-' 

1.68-10-' 

4.7940“" 


Cost 

Diabetes 

Spam 1 

FN 

FP 

CE 

NEC 

FN 

FP 

CE 

NEC 

[1,100] 

9.8140“' 

4.0210“' 

3.45-10-' 

1.3710“" 

3.8940“' 

1.3840“" 

2.41-10“' 

1.75-10“" 

[1,50] 

9.5940“' 

6.0210“' 

3.39-10-' 

2.4740“" 

3.4640“' 

1.9940“" 

2.17-10“' 

2.63-10“" 

[1,25] 

9.0310“' 

1.6110“" 

3.25-10-' 

5.02.10“" 

2.74.10“' 

2.7640“" 

1.77-10“' 

3.71-10“" 

[1,10] 

7.8740“' 

3.4110“" 

2.97-10-' 

1.0310“' 

1.92.10“' 

4.42-10“" 

1.34-10“' 

5.76-10“" 

[1,7] 

6.52.10“' 

4.0210“" 

2.54-10-' 

1.1740“' 

1.6910“' 

4.7540“" 

1.21-10“' 

6.27-10“" 

[1,5] 

6.4840“' 

4.2210“" 

2.54-10-' 

1.4310“' 

1.5610“' 

5.41-10“" 

1.16-10“' 

7.10-10“" 

[1,3] 

5.62.10“' 

6.6310“" 

2.39-10-' 

1.9010“' 

1.2310“' 

6.4640“" 

9.98-10“" 

7.91-10“" 

[1,2] 

4.7940“' 

1.2010“' 

2.46-10-' 

2.4040“' 

1.05 40“' 

6.7340“" 

9.00-10“" 

7.98-10“" 

[2,3] 

3.5640“' 

1.9910“' 

2.54-10-' 

2.62.10“' 

8.9340“" 

6.84-10“" 

8.11-10“" 

7.68-10“" 

[1,1] 

3.0310“' 

2.3110“' 

2.56-10-' 

2.6740“' 

7.9340“" 

8.2840“" 

8.07-10“" 

8.10-10“" 

[3,2] 

2.4040“' 

3.03 10“' 

2.81-10-' 

2.6540“' 

7.1840“" 

9.1140“" 

7.94-10“" 

7.95-10“" 

[2,1] 

1.5740“' 

3.7110“' 

2.97-10-' 

2.2940“' 

6.2840“" 

9.3310“" 

7.48-10“" 

7.30-10“" 

[3,1] 

1.42.10“' 

4.3210“' 

3.31-10-' 

2.1540“' 

5.8140“" 

1.1340“' 

7.98-10“" 

7.19-10“" 

[5,1] 

9.3640“" 

5.1610“' 

3.69-10-' 

1.6410“' 

4.9910“" 

1.41-10“' 

8.57-10“" 

6.50-10“" 

[7,1] 

8.9940“" 

5.4210“' 

3.84-10-' 

1.4640“' 

4.4940“" 

1.4640“' 

8.46-10“" 

5.75-10“" 

[10,1] 

7.12.10“" 

5.9410“' 

4.12-10-' 

1.1910“' 

4.3110“" 

1.6940“' 

9.26-10“" 

5.45-10“" 

[25,1] 

4.4940“" 

6.7910“' 

4.58-10-' 

6.9340“" 

3.9840“" 

2.1610“' 

1.09-10“' 

4.66-10“" 

[50,1] 

2.62.10“" 

7.2710“' 

4.82-10-' 

4.0040“" 

3.41.10“" 

2.41-10“' 

1.15-10“' 

3.81-10“" 

[100,1] 

1.5040“" 

7.65 40“' 

5.03-10-' 

2.24.10“" 

2.94.10“" 

2.8140“' 

1.29-10“' 

3.19-10“" 
related to AdaBoostDB entail differences in computing time that should
be quantified. In this section we evaluate both aspects comparatively.

It is important to highlight that Cost-Sensitive AdaBoost has been shown
to outperform other previous asymmetric approaches in the literature, as
can be seen in [17] and [l^. Consequently, to avoid repeating
experiments already published in other works, we have focused our efforts
on comparing our method with Cost-Sensitive AdaBoost and demonstrating
that, though very different in computational burden, both algorithms are
equivalent in classification performance. We encourage the reader to consult
[17] and [l^ for a deeper comparison with other algorithms since, as we will
see, the classification performance differences between Cost-Sensitive AdaBoost
and AdaBoostDB, due only to numerical errors, are negligible.


4.4.1. Classification Performance

As we have just noted, although theoretically equivalent, the classifiers
obtained from AdaBoostDB and Cost-Sensitive AdaBoost tend to differ due
to numerical errors related to the different model (polynomial vs. hyperbolic)
adopted in each case. In Section 4.2 we have seen that the differences in the
Bayes scenario are negligible. To further test the relevance of this difference,
we have again used the same datasets, cost combinations and 3-fold
cross-validation strategy as in the previous section, now applied to
Cost-Sensitive AdaBoost.

The mean error between the two alternatives is tabulated in Table 2 and,
as can be seen, it is only on the order of hundredths in the worst case. To give
a more visual interpretation of these differences, we have also computed the
mean and standard deviation, across all the datasets, of the Normalized Expected
Cost (the most accurate single measure of asymmetric performance)
for every trained cost combination. The result can be seen in Figure 7, where
the differences are in the range of thousandths. As expected, the classification
performance differences are again negligible in all cases.


4.4.2. Computation Time 

The next item of our empirical comparison is to quantify, in terms of
time and number of evaluated zeros, the accelerating power of AdaBoostDB
with respect to Cost-Sensitive Boosting. For this task, we have recorded the
time consumed to train all the classifiers used in the previous tests for
AdaBoostDB and Cost-Sensitive AdaBoost, plus one more variation:
AdaBoostDB is also computed without the Conditional Search, in order to
Table 2: Mean error between AdaBoostDB and Cost-Sensitive Boosting. It has been
computed across the 3 cross-validation training and test sets of every dataset and all
trained rounds.


Cost 

Bayes 

TwoClouds 1 

FN 

FP 

CE 

NEC 

FN 

FP 

CE 

NEC 

[1,100] 

6.4840-'' 

6.02.10-1 

2.99.10-1 

5.65.10-1 

0 

0 

0 

0 

[1,50] 

7.23-10-^ 

8.5310-1 

3.34.10-1 

7.54.10-1 

4.82.10-1 

4.02.10-1 

2.21.10-1 

2.9910-1 

[1,25] 

2.56-10-^ 

5.52.10-1 

1.51.10-1 

6.0210-1 

0 

0 

0 

0 

[1,10] 

3.46'10-^ 

6.53.10-1 

1.61.10-1 

4.15.10-1 

8.39.10-1 

4.13.10-1 

2.1310-1 

3.0010-1 

[1,7] 

4.82.10-^ 

5.02.10-1 

2.16.10-1 

3.1410-1 

5.82.10-1 

7.32.10-1 

2.59.10-1 

2.47.10-1 

[1,5] 

4.27-10-^ 

1.05.10-1 

1.66.10-1 

6.86.10-1 

3.43.10-1 

1.57.10-1 

2.16.10-1 

1.6410-1 

[1,3] 

2.7640“^ 

6.5310-1 

1.05.10-1 

4.52.10-1 

1.91.10-1 

2.83.10-1 

8.3910-1 

3.31.10-1 

[1,2] 

2.8140-^ 

1.4610-1 

6.78.10-1 

3.3510-1 

2.35.10-1 

8.15.10-1 

7.91.10-1 

3.27.10-1 

[2,3] 

4.02.10-1 

0 

2.01.10-1 

1.61.10-1 

4.65.10-1 

3.6910-1 

1.35.10-1 

1.41.10-1 

[1,1] 

0 

0 

0 

0 

6.67.10-1 

7.54.10-1 

1.41.10-1 

1.41.10-1 

[3,2] 

3.01.10-1 

1.51.10-1 

7.53.10-1 

6.02.10-1 

7.22.10-1 

6.11.10-1 

1.69.10-1 

2.37.10-1 

[2,1] 

1.41.10-1 

1.2610-1 

1.26.10-1 

5.19.10-1 

3.5410-1 

4.61.10-1 

1.24.10-1 

1.3010-1 

[3,1] 

9.54.10-1 

3.82.10-1 

1.43.10-1 

6.65.10-1 

1.65.10-1 

5.08.10-1 

1.83.10-1 

9.74.10-1 

[5,1] 

1.10.10-1 

4.82.10-1 

1.91.10-1 

3.85.10-1 

9.45.10-1 

2.48.10-1 

1.38.10-1 

8.43.10-1 

[7,1] 

1.3610-1 

5.6210-1 

2.1310-1 

5.46.10-1 

1.65.10-1 

1.18.10-1 

6.73.10-1 

2.92.10-1 

[10,1] 

6.5310-1 

2.31.10-1 

1.08.10-1 

5.84.10-1 

0 

5.20.10-1 

2.60.10-1 

4.72.10-1 

[25,1] 

6.5310-1 

3.41.10-1 

1.38.10-1 

5.12.10-1 

4.02.10-1 

2.01.10-1 

1.00.10-1 

3.78.10-1 

[50.1] 

8.53.10-1 

4.3710-1 

1.76.10-1 

7.51.10-1 

0 

0 

0 

0 

[100,1] 

9.0410-1 

4.7710-1 

1.93.10-1 

8.47.10-1 

4.02.10-1 

2.01.10-1 

1.00.10-1 

3.9610-1 


Cost 

Credit 

Ionosphere | 

FN 

FP 

CE 

NEC 

FN 

FP 

CE 

NEC 

[1,100] 

0 

0 

0 

0 

4.44.10-1 

1.32.10-1 

3.3210-1 

1.3510-1 

[1,50] 

0 

0 

0 

0 

3.0410-1 

1.32.10-1 

1.90.10-1 

9.34.10-1 

[1,25] 

3.92.10-1 

0 

1.18.10-1 

1.51.10-1 

3.48.10-1 

2.65.10-1 

2.14.10-1 

1.2010-1 

[1,10] 

4.12.10-1 

1.0110-1 

1.59.10-1 

1.1510-1 

1.78.10-1 

1.19.10-1 

9.02.10-1 

9.75.10-1 

[1,7] 

1.12.10-1 

2.95.10-1 

2.1210-1 

1.9610-1 

8.89.10-1 

5.29.10-1 

3.80.10-1 

3.52.10-1 

[1,5] 

2.67.10-1 

8.42.10-1 

5.06.10-1 

4.6610-1 

4.30.10-1 

2.65.10-1 

2.66.10-1 

4.9610-1 

[1,3] 

2.12.10-1 

8.92.10-1 

2.24.10-1 

3.2210-1 

1.85.10-1 

1.59.10-1 

1.38.10-1 

1.1310-1 

[1,2] 

2.3310-1 

1.48.10-1 

5.95.10-1 

5.31.10-1 

8.37.10-1 

3.70.10-1 

4.51.10-1 

1.6210-1 

[2,3] 

5.49.10-1 

2.3610-1 

1.18.10-1 

1.6210-1 

2.59.10-1 

2.38.10-1 

1.85.10-1 

2.05.10-1 

[1,1] 

0 

0 

0 

0 

0 

0 

0 

0 

[3,2] 

5.88.10-1 

7.15.10-1 

4.53.10-1 

3.87.10-1 

1.41.10-1 

2.51.10-1 

8.55.10-1 

9.27.10-1 

[2,1] 

7.65.10-1 

9.4310-1 

4.89.10-1 

2.70.10-1 

5.9310-1 

2.25.10-1 

9.97.10-1 

9.47.10-1 

[3,1] 

9.02.10-1 

2.2310-1 

1.51.10-1 

5.87.10-1 

5.9310-1 

1.06.10-1 

6.65.10-1 

5.98.10-1 

[5,1] 

4.51.10-1 

1.0010-1 

6.24.10-1 

3.31.10-1 

8.89.10-1 

2.51.10-1 

1.28.10-1 

1.07.10-1 

[7,1] 

3.3310-1 

8.42.10-1 

6.3010-1 

3.1910-1 

1.48.10-1 

2.12.10-1 

6.65.10-1 

1.2010-1 

[10,1] 

0 

1.2610-1 

8.83.10-1 

1.15.10-1 

6.67.10-1 

1.72.10-1 

2.85.10-1 

4.74.10-1 

[25,1] 

3.1410-1 

5.3910-1 

4.36.10-1 

3.1610-1 

8.89.10-1 

1.19.10-1 

8.07.10-1 

8.80.10-1 

[50,1] 

0 

3.7910-1 

2.65.10-1 

7.43.10-1 

9.6310-1 

1.85.10-1 

6.17.10-1 

9.1310-1 

[100,1] 

0 

1.6810-1 

1.18.10-1 

1.6710-1 

9.6310-1 

1.85.10-1 

7.12.10-1 

9.40.10-1 


Cost 

Diabetes 

Spam 1 

FN 

FP 

CE 

NEC 

FN 

FP 

CE 

NEC 

[1,100] 

7.49.10-1 

0 

2.61.10-1 

7.42.10-“ 

2.61.10-1 

2.96.10-1 

1.48.10-1 

2.77.10-1 

[1,50] 

8.9310-1 

0 

3.1210-1 

1.75.10-1 

2.30.10-1 

3.4310-1 

1.33.10-1 

3.1510-1 

[1,25] 

1.15.10-1 

2.4710-1 

2.82.10-1 

2.2010-1 

2.38.10-1 

3.1910-1 

1.3910-1 

2.78.10-1 

[1,10] 

1.87.10-1 

6.33 10-1 

7.24.10-1 

5.78.10-1 

1.47.10-1 

4.52.10-1 

8.28.10-1 

3.7010-1 

[1,7] 

4.35.10-1 

1.0810-1 

8.75.10-1 

4.71.10-1 

1.42.10-1 

4.58.10-1 

7.91.10-1 

3.11.10-1 

[1,5] 

4.64.10-1 

6.02.10-1 

1.29.10-1 

5.47.10-1 

1.6410-1 

5.18.10-1 

9.06.10-1 

4.01.10-1 

[1,3] 

4.61.10-1 

9.2710-1 

1.61.10-1 

1.15.10-1 

1.2910-1 

7.07.10-1 

6.70.10-1 

4.41.10-1 

[1,2] 

8.9310-1 

6.02.10-1 

6.03.10-1 

5.9910-1 

9.44.10-1 

3.1710-1 

5.16.10-1 

2.54.10-1 

[2,3] 

1.61.10-1 

9.42.10-1 

5.93.10-1 

6.2410-1 

8.19.10-1 

3.44.10-1 

4.98.10-1 

3.54.10-1 

[1,1] 

0 

0 

0 

0 

0 

0 

0 

0 

[3,2] 

1.15.10-1 

1.7010-1 

7.04.10-1 

6.3010-1 

1.08.10-1 

2.64.10-1 

7.79.10-1 

7.9010-1 

[2,1] 

3.2310-1 

2.9710-1 

9.05.10-1 

1.2310-1 

5.75.10-1 

5.95.10-1 

2.90.10-1 

3.12.10-1 

[3,1] 

2.30.10-1 

3.82.10-1 

1.90.10-1 

1.3810-1 

7.97.10-1 

1.48.10-1 

4.24.10-1 

3.8310-1 

[5,1] 

1.61.10-1 

2.7310-1 

1.38.10-1 

9.40.10-1 

6.89.10-1 

8.06.10-1 

5.40.10-1 

6.2010-1 

[7,1] 

1.07.10-1 

3.4910-1 

1.98.10-1 

5.9310-1 

4.6310-1 

1.43.10-1 

5.68.10-1 

3.9810-1 

[10,1] 

8.93.10-1 

3.88.10-1 

2.43.10-1 

6.3010-1 

3.46.10-1 

1.71.10-1 

6.05.10-1 

2.82.10-1 

[25,1] 

4.32.10-1 

2.8710-1 

1.72.10-1 

3.1310-1 

3.3810-1 

1.57.10-1 

5.81.10-1 

3.0710-1 

[50,1] 

1.7310-1 

2.8010-1 

1.82.10-1 

1.87.10-1 

1.89.10-1 

1.38.10-1 

5.67.10-1 

1.9010-1 

[100,1] 

5.76.10-1 

2.32.10-1 

1.49.10-1 

6.9010-1 

2.47.10-1 

1.90.10-1 

7.62.10-1 

2.45.10-1 
Figure 7: Mean error and standard deviation of the Normalized Expected Cost between
AdaBoostDB and Cost-Sensitive Boosting across all the datasets and for every
cost combination.


evaluate how much of the time saving is attributable to the polynomial
model alone, leaving aside the Conditional Search.

Results are shown in Table 3. As can be seen in the last rows of this
table, the polynomial model, even when evaluating the same number of zeros (the
searching method is the Zeroin algorithm [2^, |2^), achieves an average training
time saving of about 25% with respect to the hyperbolic model in Cost-Sensitive
Boosting. On the other hand, the Conditional Search method achieves a reduction
of over 99.5% in the total number of evaluated zeros, so that the full version of
AdaBoostDB consumes on average only 0.49% of the time used by Cost-Sensitive
Boosting. That is, it is more than 200 times faster.
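
The arithmetic behind these percentages can be reproduced with a short sketch. The helper name `saving` is ours, and the three timings are the elapsed seconds listed in Table 3 for the Bayes dataset with costs [1,100] (CS 579.76, DBN 462.20, DB 1.96); figures for a single cell naturally differ somewhat from the per-dataset averages in the last rows of the table.

```python
def saving(reference_seconds, new_seconds):
    """Fraction of training time saved relative to the reference method."""
    return 1.0 - new_seconds / reference_seconds


# Bayes dataset, costs [1,100], elapsed seconds as listed in Table 3.
cs, dbn, db = 579.76, 462.20, 1.96

poly_only = saving(cs, dbn)   # polynomial model alone: about 20% for this cell
full = saving(cs, db)         # polynomial model plus Conditional Search: about 99.7%

# Consuming 0.49% of the reference time corresponds to a speed-up of 1 / 0.0049.
speedup_avg = 1.0 / 0.0049    # about 204, hence "more than 200 times faster"
```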


4.5. Real-world dataset

As a last experiment we have trained, with AdaBoostDB as the learning
algorithm, a simple mono-stage face detector using Haar-like features jsj, a kind
of real-world asymmetric problem in which boosting is commonly used. For
this purpose we have used a balanced subset (i.e. with the same number of
positive and negative samples) of the CBCL training face and non-face
Table 3: Training computational burden (number of zero searches and elapsed time in
seconds) of Cost-Sensitive AdaBoost [CS], AdaBoostDB without conditional search [DBN],
and AdaBoostDB (with conditional search) [DB] over the synthetic and UCI sets.


Cost     Method  Bayes             Two Clouds        Credit            Ionosphere        Diabetes          Spam
                 Zeros    Time     Zeros    Time     Zeros    Time     Zeros    Time     Zeros    Time     Zeros    Time
[1,100]  CS      864528   579.76   3416944  2204.93  42607    27.52    317550   206.82   132992   84.76    9637557  6123.66
         DBN     864528   462.20   3416944  1606.21  42607    19.61    317550   154.51   132992   62.02    9637557  4547.29
         DB      3244     1.96     3575     3.86     606      0.43     698      0.59     647      0.44     23448    20.50
[1,50]   CS      864528   583.36   3416944  2183.68  42607    27.28    317550   206.94   132992   85.49    9637557  6012.60
         DBN     864528   450.55   3416944  1570.13  42607    19.76    317550   149.75   132992   60.79    9637557  4305.86
         DB      3626     2.27     3601     3.97     598      0.45     734      0.64     667      0.47     21641    19.65
[1,25]   CS      864528   570.09   3416944  2219.51  42607    27.02    317550   206.49   132992   89.05    9637557  6001.78
         DBN     864528   452.93   3416944  1537.65  42607    19.26    317550   151.34   132992   62.18    9637557  4226.14
         DB      3900     2.69     3601     3.97     635      0.46     660      0.55     655      0.45     19305    18.37
[1,10]   CS      864528   571.30   3416944  2181.71  42607    26.99    317550   206.37   132992   84.97    9637557  5983.12
         DBN     864528   462.02   3416944  1510.47  42607    18.68    317550   161.55   132992   58.62    9637557  4122.24
         DB      4496     2.91     3445     3.79     623      0.44     579      0.53     664      0.44     15804    16.21
[1,7]    CS      864528   563.62   3416944  2196.89  42607    27.39    317550   208.32   132992   84.82    9637557  5980.80
         DBN     864528   459.41   3416944  1531.74  42607    18.64    317550   166.99   132992   59.19    9637557  4124.47
         DB      4465     2.78     3406     3.75     632      0.42     596      0.50     687      0.46     14389    15.30
[1,5]    CS      864528   560.89   3416944  2122.42  42607    28.95    317550   208.91   132992   84.55    9637557  5955.99
         DBN     864528   467.03   3416944  1544.07  42607    20.48    317550   171.58   132992   59.41    9637557  4147.62
         DB      4403     2.69     3360     3.60     616      0.47     544      0.46     661      0.42     13122    14.58
[1,3]    CS      864528   559.56   3416944  2106.83  42607    28.49    317550   205.09   132992   84.61    9637557  5930.07
         DBN     864528   468.71   3416944  1555.67  42607    19.86    317550   174.67   132992   60.61    9637557  4180.37
         DB      4249     2.55     3304     3.52     544      0.39     508      0.44     615      0.40     11107    13.42
[1,2]    CS      864528   554.43   3416944  2100.07  42607    29.18    317550   207.11   132992   85.12    9637557  5899.29
         DBN     864528   477.71   3416944  1590.33  42607    20.86    317550   178.75   132992   62.52    9637557  4229.89
         DB      4291     2.63     3170     3.45     543      0.42     473      0.46     645      0.42     10715    13.24
[2,3]    CS      864528   564.79   3416944  2098.74  42607    28.19    317550   204.90   132992   84.15    9637557  5904.34
         DBN     864528   452.25   3416944  1502.59  42607    19.72    317550   170.06   132992   60.38    9637557  4076.98
         DB      3963     2.49     3162     3.45     566      0.41     428      0.40     646      0.41     9783     12.75
[1,1]    CS      864528   563.24   3416944  2097.62  42607    27.29    317550   204.26   132992   84.17    9637557  5877.44
         DBN     864528   518.04   3416944  1629.15  42607    19.47    317550   182.59   132992   64.14    9637557  4292.83
         DB      2320     1.61     3331     3.46     492      0.36     421      0.41     642      0.40     9804     12.58
[3,2]    CS      864528   555.61   3416944  2099.28  42607    27.00    317550   205.13   132992   84.52    9637557  5913.31
         DBN     864528   448.32   3416944  1502.94  42607    18.67    317550   171.57   132992   59.71    9637557  4077.99
         DB      1617     1.29     3397     3.56     531      0.39     410      0.39     590      0.38     9164     12.44
[2,1]    CS      864528   555.43   3416944  2102.39  42607    27.00    317550   204.46   132992   84.13    9637557  5912.66
         DBN     864528   483.04   3416944  1592.71  42607    18.98    317550   182.45   132992   62.60    9637557  4232.63
         DB      1590     1.30     3392     3.55     545      0.40     382      0.39     633      0.41     8896     12.32
[3,1]    CS      864528   558.94   3416944  2109.63  42607    27.05    317550   213.56   132992   86.65    9637557  5957.71
         DBN     864528   475.44   3416944  1555.26  42607    19.20    317550   184.76   132992   63.04    9637557  4179.39
         DB      1082     1.07     3469     3.59     472      0.36     432      0.45     600      0.40     8744     12.24
[5,1]    CS      864528   560.19   3416944  2110.52  42607    27.08    317550   206.32   132992   84.59    9637557  5966.70
         DBN     864528   469.49   3416944  1515.05  42607    18.81    317550   178.64   132992   60.76    9637557  4145.01
         DB      879      0.94     3569     3.65     479      0.38     392      0.38     508      0.35     8608     12.22
[7,1]    CS      864528   565.02   3416944  2119.74  42607    27.23    317550   208.69   132992   85.38    9637557  6004.51
         DBN     864528   461.46   3416944  1482.16  42607    18.74    317550   180.49   132992   59.88    9637557  4123.34
         DB      1080     1.20     3546     3.66     470      0.34     382      0.38     544      0.38     8643     12.29
[10,1]   CS      864528   572.12   3416944  2118.36  42607    26.94    317550   213.33   132992   85.57    9637557  6005.74
         DBN     864528   460.56   3416944  1461.84  42607    18.75    317550   176.00   132992   59.43    9637557  4126.60
         DB      1270     1.18     3622     3.73     435      0.33     459      0.45     547      0.38     9607     12.90
[25,1]   CS      864528   575.71   3416944  2121.00  42607    27.25    317550   214.90   132992   85.68    9637557  6025.70
         DBN     864528   457.59   3416944  1497.55  42607    19.11    317550   176.45   132992   60.98    9637557  4229.99
         DB      1523     1.35     3668     3.84     500      0.41     505      0.47     594      0.42     10683    13.67
[50,1]   CS      864528   578.59   3416944  2122.83  42607    27.13    317550   212.13   132992   85.56    9637557  6046.18
         DBN     864528   470.39   3416944  1527.57  42607    19.49    317550   171.24   132992   61.58    9637557  4321.96
         DB      1439     1.32     3662     3.86     486      0.37     500      0.51     599      0.42     11619    14.20
[100,1]  CS      864528   601.96   3416944  2126.83  42607    27.26    317550   215.27   132992   85.95    9637557  6060.87
         DBN     864528   500.53   3416944  1565.47  42607    19.84    317550   175.26   132992   63.33    9637557  4462.43
         DB      1375     1.24     3630     3.79     481      0.39     508      0.46     628      0.43     12667    14.57
Impr     CS→DBN  -        17.54%   -        27.76%   -        29.56%   -        17.68%   -        28.30%   -        29.42%
         DBN→DB  99.69%   99.60%   99.90%   99.76%   98.73%   97.93%   99.84%   99.72%   99.53%   99.32%   99.87%   99.66%
         CS→DB   99.69%   99.67%   99.90%   99.83%   98.73%   98.54%   99.84%   99.78%   99.53%   99.51%   99.87%   99.76%
datasets [27], [2^. The results obtained are shown in Table 4, confirming once
again the consistent cost-sensitive behavior of the classifiers trained with
AdaBoostDB in different scenarios.


Table 4: AdaBoostDB asymmetric behavior (false negatives, false positives, classification
error and normalized expected cost) for each cost combination over the CBCL example
dataset.


Cost      CBCL
          FN          FP          CE          NEC
[1,100]   4.40·10^-1  7.50·10^-3  2.24·10^-1  1.18·10^-2
[1,50]    2.81·10^-1  1.25·10^-2  1.47·10^-1  1.78·10^-2
[1,25]    2.35·10^-1  1.08·10^-2  1.23·10^-1  1.95·10^-2
[1,10]    2.78·10^-1  1.00·10^-2  1.44·10^-1  3.43·10^-2
[1,7]     1.72·10^-1  1.58·10^-2  9.38·10^-2  3.53·10^-2
[1,5]     1.20·10^-1  1.00·10^-2  6.50·10^-2  2.83·10^-2
[1,3]     1.17·10^-1  2.33·10^-2  7.00·10^-2  4.67·10^-2
[1,2]     9.92·10^-2  2.83·10^-2  6.38·10^-2  5.19·10^-2
[2,3]     7.08·10^-2  1.83·10^-2  4.46·10^-2  3.93·10^-2
[1,1]     8.50·10^-2  2.25·10^-2  5.38·10^-2  5.38·10^-2
[3,2]     6.92·10^-2  4.25·10^-2  5.58·10^-2  5.85·10^-2
[2,1]     8.17·10^-2  2.58·10^-2  5.38·10^-2  6.31·10^-2
[3,1]     4.25·10^-2  3.42·10^-2  3.83·10^-2  4.04·10^-2
[5,1]     9.08·10^-2  2.92·10^-2  6.00·10^-2  8.06·10^-2
[7,1]     4.17·10^-2  6.58·10^-2  5.38·10^-2  4.47·10^-2
[10,1]    4.25·10^-2  5.00·10^-2  4.63·10^-2  4.32·10^-2
[25,1]    3.33·10^-2  7.17·10^-2  5.25·10^-2  3.48·10^-2
[50,1]    2.08·10^-2  1.29·10^-1  7.50·10^-2  2.30·10^-2
[100,1]   3.92·10^-2  1.65·10^-1  1.02·10^-1  4.04·10^-2


5. Conclusions 


In this paper we have presented, derived and empirically tested a new
cost-sensitive AdaBoost scheme, AdaBoostDB, based on double-base
exponential error bounds. Sharing an equivalent theoretical root with
Cost-Sensitive Boosting [17], and unlike most other asymmetric
approaches in the literature, AdaBoostDB is supported by a full theoretical
derivation that makes it possible to preserve all the formal guarantees of the
original AdaBoost in a general asymmetric scenario.

Our approach rests on three mainstays: the double-base perspective,
a derivation scheme based on the generalized boosting framework
^ (instead of the Statistical View of Boosting used in [l^) and a
polynomial model for the problem (as opposed to the hyperbolic one proposed in
[17]). These distinctive features, as a whole, also enable a Conditional Search
method that makes the algorithm more compact, simpler and more efficient. As a
consequence, AdaBoostDB training consumes on average only 0.49% of the time
needed by Cost-Sensitive AdaBoost to reach the same solution. This
computational advantage (200 times faster) can make a difference in
applications coping with a huge number (hundreds of thousands, even millions)
of weak hypotheses, such as object detection in computer vision.

From this point, the next steps of our research will require a thorough
comparison between AdaBoostDB/Cost-Sensitive AdaBoost and AdaBoost with
asymmetric weight initialization [l^ (the other fully theoretical asymmetric
boosting model in the literature) in order to clarify, both theoretically and
practically, the different properties, advantages and disadvantages of each
asymmetry model.
Algorithm 2 AdaBoostDB

Input:
    Training set of n examples: $(x_i, y_i)$, where $y_i = 1$ if $1 \le i \le m$ and $y_i = -1$ if $m < i \le n$
    Distribution of associated weights: $D(i)$
    Pool of F weak classifiers: $h_f(x)$
    Cost parameters: $C_P$, $C_N$
    Number of rounds: T

Initialize:
    Weight subdistributions:
        $D_P(i) = D(i)$ if $1 \le i \le m$,
        $D_N(i) = D(i)$ if $m < i \le n$.
    Accumulators: $A_P = 1$, $A_N = 1$.

for t = 1 to T do
    Initialize:
        Minimum root: $r = 1$
        Minimum root vector: $\vec{r} = (2, 2)$
        Scalar product: $s = 1$
    Update accumulators:
        $A_P = A_P \sum_{i=1}^{m} D_P(i)$,
        $A_N = A_N \sum_{i=m+1}^{n} D_N(i)$.
    Normalize weight subdistributions:
        $D_P(i) = D_P(i) / \sum_{i=1}^{m} D_P(i)$,
        $D_N(i) = D_N(i) / \sum_{i=m+1}^{n} D_N(i)$.
    Calculate static parameters:
        $a = C_P A_P / (C_P A_P + C_N A_N)$,
        $b = C_N A_N / (C_P A_P + C_N A_N)$.
    for f = 1 to F do
        Calculate variable parameters: $\varepsilon_{P,f}$, $\varepsilon_{N,f}$ (weighted errors of $h_f$ over positives and negatives)
        Calculate current classifier vector: $\vec{c} = (a \cdot \varepsilon_{P,f},\; b \cdot \varepsilon_{N,f})$
        CONDITIONAL SEARCH
        if $a \cdot \varepsilon_{P,f} + b \cdot \varepsilon_{N,f} < 1/2$ [Contribution Condition] then
            if $\vec{c} \cdot \vec{r} < s$ [Improvement Condition] then
                Search the only real and positive root r of the polynomial:
                    $(a \cdot \varepsilon_{P,f})\, x^{2C_P} + (b \cdot \varepsilon_{N,f})\, x^{C_P + C_N} + b(\varepsilon_{N,f} - 1)\, x^{C_P - C_N} + a(\varepsilon_{P,f} - 1) = 0$
                Update parameters:
                    $\vec{r} = (r^{2C_P} + 1,\; r^{C_P + C_N} + r^{C_P - C_N})$,
                    $s = \vec{c} \cdot \vec{r}$
                Keep $h_f$ as the round t solution $h_t$.
            end if
        end if
    end for
    Calculate goodness parameter: $\alpha_t = \log(r)$
    Update weight subdistributions:
        $D_P(i) = D_P(i) \exp(-C_P \alpha_t h_t(x_i))$,
        $D_N(i) = D_N(i) \exp(C_N \alpha_t h_t(x_i))$.
end for

Final Classifier:
    $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$
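
The inner loop of one boosting round can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation used in the experiments. The names `stationarity`, `positive_root` and `conditional_search` are ours, and a plain bisection stands in for the Zeroin root finder mentioned in the text. The polynomial is written in its equivalent form divided by $x^{C_P}$, and classifiers with zero weighted error, for which no finite root exists, are simply skipped. Under this reading, the Improvement Condition is equivalent to checking the sign of the candidate's polynomial at the current root, so a root search is launched only for candidates whose root exceeds the current one.

```python
import math


def stationarity(x, a, b, eps_p, eps_n, c_p, c_n):
    """Polynomial of the algorithm divided by x**c_p; increasing in x for x > 0."""
    return (a * (eps_p * x ** c_p - (1.0 - eps_p) * x ** (-c_p))
            + b * (eps_n * x ** c_n - (1.0 - eps_n) * x ** (-c_n)))


def positive_root(a, b, eps_p, eps_n, c_p, c_n, tol=1e-12):
    """Bisection for the root in (1, inf); needs 0 < a*eps_p + b*eps_n < 1/2."""
    lo, hi = 1.0, 2.0
    while stationarity(hi, a, b, eps_p, eps_n, c_p, c_n) < 0.0:
        hi *= 2.0  # expand the bracket until the sign changes
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if stationarity(mid, a, b, eps_p, eps_n, c_p, c_n) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def conditional_search(errors, a, b, c_p, c_n):
    """errors: list of (eps_p, eps_n) pairs, one per weak classifier.

    Returns (index of the kept classifier, alpha, number of root searches).
    """
    r, r_vec, s = 1.0, (2.0, 2.0), 1.0
    best, searches = None, 0
    for f, (eps_p, eps_n) in enumerate(errors):
        c_vec = (a * eps_p, b * eps_n)
        # Contribution Condition (zero-error classifiers skipped by assumption).
        if not 0.0 < c_vec[0] + c_vec[1] < 0.5:
            continue
        # Improvement Condition: skip candidates that cannot beat the current root.
        if c_vec[0] * r_vec[0] + c_vec[1] * r_vec[1] >= s:
            continue
        r = positive_root(a, b, eps_p, eps_n, c_p, c_n)
        searches += 1
        r_vec = (r ** (2 * c_p) + 1.0, r ** (c_p + c_n) + r ** (c_p - c_n))
        s = c_vec[0] * r_vec[0] + c_vec[1] * r_vec[1]
        best = f
    return best, math.log(r), searches


# Symmetric toy case: C_P = C_N = 1 and a = b = 1/2 reduce to standard AdaBoost.
best, alpha, searches = conditional_search(
    [(0.3, 0.3), (0.2, 0.2), (0.25, 0.25)], a=0.5, b=0.5, c_p=1, c_n=1)
```

In the symmetric toy case the third candidate fails the Improvement Condition and is discarded without a root search, which is the mechanism behind the reduction in evaluated zeros reported in Table 3.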






